Interactive Algorithms for Unsupervised Machine Learning

Authors

  • Akshay Krishnamurthy
  • Maria Florina Balcan
  • Barnabás Póczos
  • Larry Wasserman
  • Sanjoy Dasgupta
Abstract

This thesis explores the power of interactivity in unsupervised machine learning problems. Interactive algorithms employ feedback-driven measurements to mitigate the cost of data acquisition and consequently enable statistical analysis in otherwise intractable settings. Unsupervised learning methods are fundamental tools across a variety of domains, and interactive procedures promise to broaden the scope of statistical analysis. We develop interactive mechanisms and inference procedures for three unsupervised problems: subspace learning, clustering, and tree metric learning. Our theoretical and empirical analysis shows that interactivity can bring both statistical and computational improvements over non-interactive approaches. In addition, an overarching thread of this thesis is that interactive learning is particularly powerful for non-uniform datasets, where non-uniformity is quantified differently in each setting.

We first study the subspace learning problem, where the goal is to recover or approximate the principal subspace of a collection of partially observed data points. We propose statistically and computationally appealing interactive algorithms for both the matrix completion problem, where the data points lie in a low-dimensional subspace, and the matrix approximation problem, where one must approximate the principal components of an arbitrary collection of points. We measure uniformity with the notion of incoherence, which is known to be necessary for non-interactive algorithms, and we show that our feedback-driven algorithms perform well under much milder incoherence assumptions.

We next consider clustering a dataset represented by a partially observed similarity matrix. We propose an interactive procedure for recovering a hierarchical clustering from a small number of carefully selected similarity measurements. The algorithm exploits non-uniformity of cluster size by using few measurements to recover larger clusters and then focusing measurements on identifying the smaller structures. In addition to coming with strong statistical and computational guarantees, this algorithm performs well in practice.

Finally, we consider a specific metric learning problem, where we compute a latent tree metric to approximate distances over a point set. This problem is motivated by applications in network tomography, where the goal is to approximate the network structure using only measurements between pairs of end hosts. Our algorithms use an interactively chosen subset of the pairwise distances to learn the latent tree metric while being robust to either additive noise or a small number of arbitrarily corrupted distances. As before, we leverage non-uniformity inherent in the tree metric structure to achieve low sample complexity. Throughout, we complement our theoretical results with empirical evaluations.
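The abstract gives no algorithmic detail, so the following is only a minimal sketch of how feedback-driven sampling could work for matrix completion, assuming an exactly low-rank, noise-free matrix. It is not presented as the thesis's algorithm. The function name `adaptive_complete`, the per-column sample size `m`, and the tolerance `tol` are hypothetical names chosen for illustration. The idea: observe a few entries of each column, check whether they are explained by the subspace found so far, and pay for the full column only when they are not.

```python
import numpy as np


def adaptive_complete(M, m, rng, tol=1e-8):
    """Illustrative sketch: recover a low-rank matrix column by column.

    M   : full matrix, used here only as an oracle for entry queries
    m   : number of entries sampled per column (hypothetical parameter)
    rng : numpy random Generator used to pick the sampled rows
    Returns the reconstruction and the number of entries queried.
    """
    d, n = M.shape
    U = np.zeros((d, 0))        # orthonormal basis of the subspace found so far
    M_hat = np.zeros((d, n))
    n_observed = 0

    for j in range(n):
        omega = rng.choice(d, size=m, replace=False)  # rows queried in column j
        x_omega = M[omega, j]
        n_observed += m

        if U.shape[1] > 0:
            # Fit the sampled entries using the current basis restricted to omega.
            coef, *_ = np.linalg.lstsq(U[omega], x_omega, rcond=None)
            residual = x_omega - U[omega] @ coef
        else:
            coef = np.zeros(0)
            residual = x_omega

        if np.linalg.norm(residual) > tol * max(1.0, np.linalg.norm(x_omega)):
            # Feedback step: the column is not explained, so query all of it
            # and extend the basis with its component outside the subspace.
            col = M[:, j]
            n_observed += d - m
            new_dir = col - U @ (U.T @ col)
            U = np.hstack([U, (new_dir / np.linalg.norm(new_dir))[:, None]])
            M_hat[:, j] = col
        else:
            # The column lies in the current subspace: fill in unseen entries.
            M_hat[:, j] = U @ coef

    return M_hat, n_observed


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))  # rank 3
    M_hat, n_observed = adaptive_complete(M, m=10, rng=rng)
    print("max abs error:", np.abs(M_hat - M).max())
    print("entries queried:", n_observed, "of", M.size)
```

In this sketch the residual test is the only interactive element: it decides where further measurements are spent. With noisy data the exact-fit test would no longer apply as written, and a noise-aware threshold would be needed.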

Similar resources

Unsupervised Play: Machine Learning Toolkit for Max

Machine learning models are useful and attractive tools for the interactive computer musician, enabling a breadth of interfaces and instruments. With current consumer hardware it becomes possible to run advanced machine learning algorithms in demanding performance situations, yet expertise remains a prominent entry barrier for most would-be users. Currently available implementations predominant...

Unsupervised Text Classification for Natural Language Interactive Narratives

Natural language interactive narratives are a variant of traditional branching storylines where player actions are expressed in natural language rather than by selecting among choices. Previous efforts have handled the richness of natural language input using machine learning technologies for text classification, bootstrapping supervised machine learning approaches with human-in-the-loop data a...

Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures

Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...

Interactive and Incremental Learning via a Mixture of Supervised and Unsupervised Learning Strategies

Machine learning paradigms are generally separated into supervised learning and unsupervised learning. Both of these paradigms have their own advantages in practice. But existing algorithms of these two paradigms also expose some hard problems in many different applications. In this paper, we first analyze the general problems of these two paradigms, and some successful techniques for boosting ...

Publication date: 2014